Cache memory arrangement and methods for use in a cache memory system

ABSTRACT

An arrangement and methods for operation in a cache memory system to facilitate re-synchronising non-volatile cache memories (150B, 160B) following interruption in communication. A primary adapter (150) creates a non-volatile record (150C) of each cache update before it is applied to either cache. Each such record is cleared when the primary adapter knows that the cache update has been applied to both adapters' caches. In the event of a reset or other failure, the primary adapter can read the non-volatile list of transfers which were ongoing. For each entry in this list, the primary adapter negotiates with the secondary adapter (160) and transfers only the data which may be different.
The amount of data to be transferred between the adapters following reset/failure is generally much lower than under previous solutions, since the data to be transferred represents only the transactions which were in progress at the time of the reset or failure, rather than the entire non-volatile cache contents. Also, new transactions need not be suspended while even this reduced resynchronisation takes place: all that is necessary is for the (relatively short) list of in-doubt quanta of data to be searched. If the transaction does not overlap any entries in this list then it need not be suspended; if it does overlap then the transaction may be queued until the resynchronisation completes.

FIELD OF THE INVENTION

[0001] This invention relates to fault-tolerant computing systems, and particularly to storage networks with write data caching.

BACKGROUND OF THE INVENTION

[0002] In the field of this invention it is known that a storage subsystem may include two (or more) adapters, each with a non-volatile write cache which is used to store data temporarily before it is transferred to a different resource (such as a disk drive).

[0003] When a write transaction is received on one adapter (the primary adapter) the associated data is transferred to that adapter and stored in non-volatile memory. This data is also transferred to a second adapter (the secondary adapter) and made non-volatile there too, to provide fault-tolerance. When there is non-volatile data stored in either adapter's cache, the resource is flagged as having data in a cache.

[0004] Inherent in this process is a delay between the times when the data is made non-volatile on the two adapters. If a reset or other failure of one or both adapters occurs during this delay, the two non-volatile memory images may differ.

[0005] When the adapters subsequently restart operations, the non-volatile memory images must be synchronised (i.e., made to contain the same contents). This is required for a number of reasons:

[0006] Either adapter could satisfy a Read transaction from its memory image and these Read transactions must receive consistent data regardless of the receiving adapter.

[0007] Data present in one adapter and not the other may consume space on the first adapter indefinitely, thus resulting in a memory leak and reduced non-volatile capacity.

[0008] In earlier storage subsystem architecture this problem was solved by:

[0009] Invalidating the secondary adapter's cache,

Flushing the entire primary adapter's cache, and

[0010] Marking the resource as having no data in cache.

[0011] However, this approach has the disadvantage that all new transactions may be suspended until this flushing operation completes (to avoid the complexity of managing new transactions in parallel with the flushing operation). This can result in new transactions being suspended for many minutes, which is unacceptable in a high-availability fault-tolerant system. Furthermore, customer data is exposed to a single point of failure while this flushing operation is in progress. The secondary adapter's cache must be invalidated before the primary adapter's flush begins, in order to maintain data integrity: if the flush is interrupted (e.g., by a second reset of the primary adapter), the secondary adapter may subsequently flush different data to the resource. Two Read transactions, one before this second reset and one after, would return different data, resulting in a data miscompare.

[0012] Alternatively, new transactions may be allowed to proceed in parallel with the flushing operation, extending the time taken for the flushing operation. Using this approach, customer data is still exposed to a single point of failure during this, now slower, flushing operation.

[0013] An alternative solution, for example known from U.S. Pat. No. 5,761,705, is to:

[0014] Invalidate the secondary cache, and

[0015] Copy the entire primary adapter's cache to the secondary adapter's cache.

[0016] This would not take as long as the first option, but would still take a significant time. New transactions would be suspended during this time (unless significant additional complexity is accepted).

[0017] A variant of this alternative solution, for example known from U.S. Pat. No. 5,724,501, is (in a first stage) to copy a metadata list and later (in a second stage) to copy the cache data.

[0018] A need therefore exists for re-synchronising a remote copy memory image following interruption in communication wherein the abovementioned disadvantage(s) may be alleviated.

STATEMENT OF INVENTION

[0019] In accordance with a first aspect of the present invention there is provided a cache memory arrangement, for use in a data storage system, as claimed in claim 1.

[0020] In accordance with a second aspect of the present invention there is provided a method, for operation in a cache memory system, as claimed in claim 4.

[0021] In accordance with a third aspect of the present invention there is provided a method, for operation in a cache memory system, as claimed in claim 7.

[0022] In a preferred form of the present invention, a primary adapter creates a non-volatile record of each cache update before it is applied to either cache. Each such record is cleared when the primary adapter knows that the cache update has been applied to both adapters' caches.

[0023] Consequently, the primary adapter has, at all times, a non-volatile list of all ongoing transfers.

[0024] In the event of a reset or other failure, the primary adapter can read the non-volatile list of transfers which were ongoing. For each entry in this list, the primary adapter negotiates with a secondary adapter and transfers only the data which may be different.

BRIEF DESCRIPTION OF THE DRAWINGS

[0025] One method and arrangement for re-synchronising a remote copy memory image following interruption in communication incorporating the present invention will now be described, by way of example only, with reference to the accompanying drawing(s), in which:

[0026] FIG. 1 shows a block schematic diagram illustrating a data storage system in which the present invention is used;

[0027] FIG. 2 shows a flow chart illustrating the cache update process in the system of FIG. 1; and

[0028] FIG. 3 shows a flow chart illustrating the recovery after reset/failure process in the system of FIG. 1.

DESCRIPTION OF PREFERRED EMBODIMENT

[0029] FIG. 1 is a high level block diagram of a data processing system 100, incorporating one or more processors (shown generally as 110), one or more peripheral modules or devices (shown generally as 120) and a disk storage subsystem 130. The disk storage subsystem 130 includes a disk drive arrangement 140 (which may comprise one or more disk arrays of optical and/or magnetic disks), a first cache adapter 150 and a second cache adapter 160. Each of the cache adapters 150 and 160 has a dynamic memory (150A and 160A respectively) and a non-volatile memory (150B and 160B respectively). Each adapter also includes a further non-volatile memory 150C, 160C respectively.

[0030] In use of the system 100, when a write transaction is received on one of the adapters 150 or 160 (the primary adapter) the associated data is transferred to that adapter and stored in non-volatile memory (150B or 160B respectively). This data is also transferred to the other adapter (the secondary adapter) and stored in non-volatile memory (160B or 150B respectively) there too, to provide fault-tolerance. When there is non-volatile data stored in either adapter's cache, the resource is flagged as having data in a cache.

[0031] Inherent in this process is a delay between the times when the data is made non-volatile on the two adapters. If a reset or other failure of one or both adapters occurs during this delay, the two non-volatile memory images may differ.

[0032] When the adapters subsequently restart operations, the non-volatile memory images must be synchronised (i.e., made to contain the same contents). This is required for a number of reasons:

[0033] Either adapter could satisfy a Read transaction from its memory image and these Read transactions must receive consistent data regardless of the receiving adapter.

[0034] Data present in one adapter and not the other may consume space on the first adapter indefinitely, thus resulting in a memory leak and reduced non-volatile capacity.

[0035] In order to satisfy this synchronization requirement, the system 100 employs the following scheme.

[0036] As will be explained in greater detail below, the primary adapter (150 or 160) creates a non-volatile record (in non-volatile memory 150C or 160C respectively) of each cache update before it is applied to either cache's non-volatile memory 150B or 160B respectively. Each such record is cleared when the primary adapter knows that the cache update has been applied to both adapters' non-volatile memories.

[0037] Consequently, the primary adapter has, at all times, a non-volatile list (in non-volatile memory 150C or 160C respectively) of all ongoing transfers.

[0038] In the event of a reset or other failure, the primary adapter reads the non-volatile list of transfers which were ongoing. For each entry in this list, the primary adapter negotiates with the secondary adapter and transfers only the data which may be different.

[0039] Referring now to FIG. 2, the method for cache update employed in the system 100 begins at step 210. Then, at step 220, in the primary adapter, a non-volatile record (in non-volatile memory 150C or 160C) of the cache update is created before it is applied to either cache's non-volatile memory 150B or 160B. Then, at step 230, the cache update is applied to the primary adapter's non-volatile memory and to the secondary adapter's non-volatile memory 150B and 160B. Then, at step 240, in the primary adapter the non-volatile record (in memory 150C or 160C) of the cache update is cleared. The cache update ends at step 250.
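
By way of illustration only, the cache update sequence of FIG. 2 may be sketched in Python as follows. The `Adapter` class, its `cache` and `pending` dictionaries (standing in for the non-volatile memories 150B/160B and 150C/160C respectively), the transaction identifier and the `cache_update` function are hypothetical names introduced for this sketch; they are assumptions and are not taken from the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class Adapter:
    """Hypothetical model of one cache adapter (150 or 160)."""
    cache: dict = field(default_factory=dict)    # stands in for memory 150B/160B
    pending: dict = field(default_factory=dict)  # stands in for list memory 150C/160C


def cache_update(primary, secondary, txn_id, block, data):
    """Apply one cache update, bracketed by a record of the ongoing transfer."""
    # Step 220: record the update before it is applied to either cache.
    primary.pending[txn_id] = block
    # Step 230: apply the update to both adapters' caches.
    primary.cache[block] = data
    secondary.cache[block] = data
    # Final step: both copies are known to be written, so clear the record.
    del primary.pending[txn_id]


primary, secondary = Adapter(), Adapter()
cache_update(primary, secondary, txn_id=1, block=42, data=b"new")
```

In this sketch a failure at any point between creating and clearing the record leaves an entry in `pending` that names the block whose two copies may differ.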

[0040] Referring now to FIG. 3, the method for recovery after reset/failure employed in the system 100 begins at step 310. Then, at step 320, in the primary adapter, the list (in the non-volatile memory) of transfers which were ongoing (uncompleted) at reset/failure is read. Then, at step 330, for each entry in the list, the primary adapter negotiates with the secondary adapter and transfers to the secondary adapter the data which may be different between the primary and secondary adapters. The recovery after reset/failure ends at step 340.
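
By way of illustration only, the recovery sequence of FIG. 3 may be sketched in Python as follows. The description does not specify the negotiation between the adapters; this sketch assumes, purely for illustration, that the primary adapter's copy is taken as authoritative, and that a block absent from the primary adapter is dropped from the secondary adapter. The `Adapter` class and the `resynchronise` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Adapter:
    """Hypothetical model of one cache adapter (150 or 160)."""
    cache: dict = field(default_factory=dict)    # stands in for memory 150B/160B
    pending: dict = field(default_factory=dict)  # stands in for list memory 150C/160C


def resynchronise(primary, secondary):
    """Copy only the blocks named in the primary adapter's list of ongoing transfers."""
    transferred = []
    # Step 320: read the list of transfers that were ongoing at reset/failure.
    for txn_id, block in list(primary.pending.items()):
        # Step 330 (assumed policy): the primary adapter's copy wins.
        if block in primary.cache:
            secondary.cache[block] = primary.cache[block]
        else:
            secondary.cache.pop(block, None)
        transferred.append(block)
        del primary.pending[txn_id]
    return transferred


# Block 2 was being updated when the failure occurred; blocks 1 and 3 were not.
primary = Adapter(cache={1: b"a", 2: b"b-new", 3: b"c"}, pending={7: 2})
secondary = Adapter(cache={1: b"a", 2: b"b-old", 3: b"c"})
transferred = resynchronise(primary, secondary)
```

Only block 2 is transferred in this example; the remaining cache contents are left untouched because no list entry refers to them.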

[0041] It will be understood that the arrangement and method for re-synchronising a remote copy memory image following interruption in communication described above provides the following advantages:

[0042] The amount of data to be transferred between the adapters following reset or failure will be, in general, significantly lower than under previous solutions, since the data to be transferred represents only the transactions which were in progress at the time of the reset or failure, rather than the entire non-volatile cache contents; and

[0043] New transactions need not be suspended while even this reduced resynchronisation takes place: all that is necessary is for the (relatively short) list of in-doubt quanta of data to be searched. If the transaction does not overlap any entries in this list then it need not be suspended; if it does overlap then the transaction may be queued until the resynchronisation completes.
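
By way of illustration only, the search of the in-doubt list for a new transaction may be sketched in Python as follows. The representation of each in-doubt quantum as a `(start, length)` range of block addresses, and the names `overlaps` and `admit`, are assumptions made for this sketch; the description does not prescribe a particular representation.

```python
def overlaps(start, length, in_doubt):
    """Return True if [start, start + length) intersects any in-doubt range."""
    end = start + length
    return any(start < d_start + d_len and d_start < end
               for d_start, d_len in in_doubt)


def admit(start, length, in_doubt, queue):
    """Queue a transaction that touches in-doubt data; otherwise let it proceed."""
    if overlaps(start, length, in_doubt):
        queue.append((start, length))
        return "queued"
    return "proceed"


in_doubt = [(100, 8), (512, 16)]  # hypothetical ranges still being resynchronised
queue = []
first = admit(0, 50, in_doubt, queue)    # touches no in-doubt range
second = admit(104, 4, in_doubt, queue)  # lies inside the range 100..107
third = admit(108, 4, in_doubt, queue)   # begins just past the range 100..107
```

Because the list holds only the transfers that were ongoing at the reset or failure, a linear search of this kind is expected to be short.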

[0044] It will be appreciated that the methods described above for cache update and for recovery after reset/failure in a data processing system may be carried out in software running on a processor (not shown), and that the software may be provided as a computer program element carried on any suitable data carrier (also not shown) such as a magnetic or optical computer disc.

[0045] It will be appreciated that various modifications may be made to the embodiments described above. For example, the non-volatile ‘list’ memory (150C, 160C) described above as separate from the ‘main’ non-volatile memory (150B, 160B) in each adapter may in practice be provided within the non-volatile memory 150B or 160B of each adapter. Further modifications will be apparent to a person of ordinary skill in the art.

What is claimed is:
 1. A cache memory arrangement for use in a data storage system, the arrangement comprising: first cache means having non-volatile memory means for storing a first copy of data; and second cache means having non-volatile memory means for storing a second copy of said data, and additional non-volatile memory means associated with at least one of the first cache means and the second cache means, the additional non-volatile memory means being arranged to hold a list of ongoing cache data storage transactions for which data storage in the non-volatile memory means of both the first and second cache means have not been completed, the list being arranged to be cleared of cache data storage transactions for which data storage in the non-volatile memory means of both the first and second cache means have been completed.
 2. The arrangement of claim 1 wherein the first and second cache means further have volatile memory means.
 3. A disk storage system comprising the arrangement of claim 1.
 4. A method for operation in a cache memory system including first cache means having non-volatile memory means for storing a first copy of data; and second cache means having non-volatile memory means for storing a second copy of said data, the method comprising: providing additional non-volatile memory means associated with at least one of the first cache means and the second cache means, storing in the additional non-volatile memory means a list of ongoing cache data storage transactions for which data storage in the non-volatile memory means of both the first and second cache means have not been completed, and removing from the list cache data storage transactions for which data storage in the non-volatile memory means of both the first and second cache means have been completed.
 5. The method of claim 4 wherein the first and second cache means further have volatile memory means.
 6. The method of claim 4 wherein the cache memory system is arranged to operate in a disk storage system.
 7. A method for operation in a cache memory system including first cache means having non-volatile memory means for storing a first copy of data, second cache means having non-volatile memory means for storing a second copy of said data, and additional non-volatile memory means associated with at least one of the first cache means and the second cache means for storing a list of ongoing cache data storage transactions for which data storage in the non-volatile memory means of both the first and second cache means have not been completed, the method comprising: re-synchronising the first and second cache means by: reading from the list stored in the additional non-volatile memory means; and for each transaction in the list, transferring data from the non-volatile memory means of one of the first and second cache means to the non-volatile memory means of the other of the first and second cache means.
 8. The method of claim 7 wherein the first and second cache means further have volatile memory means.
 9. The method of claim 7 wherein the cache memory system is arranged to operate in a disk storage system.
 10. A computer program element comprising computer program means for performing the method of claim 4.
 11. A computer program element comprising computer program means for performing the method of claim 9.